1 Introduction



It has been a long-standing conjecture in formal language theory that all regular
languages are Church-Rosser congruential. The class of Church-Rosser congruential
languages was introduced by McNaughton, Narendran, and Otto in 1988 [8].
A language $L$ is Church-Rosser congruential if there exists a finite, confluent, and
length-reducing semi-Thue system $S$ such that $L$ is a finite union of congruence
classes modulo $S$. One of the main motivations to consider this class of languages
is that the membership problem for $L$ can be solved in linear time; this is done
by computing normal forms using the system $S$, followed by a table look-up. For
this it is not necessary that the quotient monoid $A^*/S$ is finite; it is enough that
$L$ is a finite union of congruence classes modulo $S$. It is not hard to see that
$\{a^n b^n \mid n \in \mathbb{N}\}$ is Church-Rosser congruential, but
$\{a^m b^n \mid m, n \in \mathbb{N} \text{ and } m \geq n\}$
is not. This led the authors of [8] to the more technical notion of Church-Rosser
languages; this class of languages captures all deterministic context-free languages.
For more results about Church-Rosser languages see e.g. [21 El EH fT3] . 
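
The membership test by normal forms and table look-up can be illustrated on the language $\{a^n b^n \mid n \in \mathbb{N}\}$. The following Python sketch assumes the one-rule system $S = \{aabb \to ab\}$ and the look-up table $\{1, ab\}$ of accepted normal forms; this system is chosen here for illustration and is not taken from [8], and the names `normal_form` and `is_member` are hypothetical.

```python
# Illustrative system for {a^n b^n}: a single length-reducing rule (an assumption
# made for this sketch). Its left-hand side has no overlap with itself, so the
# system is confluent.
RULES = [("aabb", "ab")]
# Table look-up: normal forms of the accepted congruence classes
# ("" stands for the empty word 1).
ACCEPTED = {"", "ab"}


def normal_form(word, rules):
    """Compute the irreducible descendant of `word` with a stack.

    Invariant: the stack never contains a left-hand side as a factor. A right-hand
    side is fed back into the input, so each letter is handled a bounded number of
    times for a fixed length-reducing system.
    """
    stack = []
    pending = list(reversed(word))  # letters still to be read, next letter last
    while pending:
        stack.append(pending.pop())
        for lhs, rhs in rules:
            k = len(lhs)
            if k <= len(stack) and "".join(stack[-k:]) == lhs:
                del stack[-k:]                  # remove the left-hand side ...
                pending.extend(reversed(rhs))   # ... and re-read the right-hand side
                break
    return "".join(stack)


def is_member(word):
    """Membership in {a^n b^n}: reduce, then look the normal form up."""
    return normal_form(word, RULES) in ACCEPTED
```

For instance, `is_member("aaabbb")` reduces `aaabbb` to `ab` and accepts, while `aab` is already irreducible and is rejected.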

From the very beginning it was strongly believed that all regular languages are
Church-Rosser congruential in the pure sense. However, after some significant
initial progress [9], [10], [11], [12], [13] there was some stagnation.

Before 2011 the most advanced result was the one announced in 2003 by Reinhardt
and Thérien [13]. According to this manuscript the conjecture is true for all
regular languages where the syntactic monoid is a group. However, the manuscript
has never been published as a refereed paper and there are some flaws in its
presentation. For us, however, the main problem with [13] has been quite different:
the statement is too weak to be useful in the induction for the general case. So,
instead of being able to use [13] as a black box, we shall prove a more general
result in the setting of weight-reducing systems. This part about group languages
is a cornerstone in our approach.

The other ingredient to our paper has been established only very recently. Knowing
that the result is true if the syntactic monoid is a group, we started looking
at aperiodic monoids. Aperiodic monoids correspond to star-free languages and
the first two authors together with Weil proved that all star-free languages are
Church-Rosser congruential [3]. Our proof became possible by loading the induction
hypothesis. This means we proved a much stronger statement. We showed
that for every star-free language $L \subseteq A^*$ there exists a finite confluent semi-Thue
system $S \subseteq A^* \times A^*$ such that the quotient monoid $A^*/S$ is finite (and aperiodic),
$L$ is a union of congruence classes modulo $S$, and moreover all right-hand sides of
rules appear as scattered subwords in the corresponding left-hand side. We called
the last property subword-reducing, and it is obvious that every subword-reducing
system is length-reducing.

We have little hope that such a strong result could be true in general. Indeed,
here we step back from subword-reducing to weight-reducing systems.

We prove in Theorem 6 the following result: Let $L \subseteq A^*$ be a regular language
and $\|a\| \in \mathbb{N} \setminus \{0\}$ be a positive weight for every letter $a \in A$ (e.g., $\|a\| = |a| = 1$).
Then we can construct for the given weight a finite, confluent and weight-reducing
semi-Thue system $S \subseteq A^* \times A^*$ such that the quotient monoid $A^*/S$ is finite and
recognizes $L$. In particular, $L$ is a finite union of congruence classes modulo $S$.

Note that this gives us another characterization for the class of regular languages.
By Corollary 7 we see that a language $L \subseteq A^*$ is regular if and only if $L$ is
recognized by a finite Church-Rosser system $S$ with finite index. As a consequence,
a long-standing conjecture about regular languages has been solved positively.

2 Preliminaries 

Words and languages. Throughout this paper, $A$ is a finite alphabet. An element
of $A$ is called a letter. The set $A^*$ is the free monoid generated by $A$. It consists
of all finite sequences of letters from $A$. The elements of $A^*$ are called words. The
empty word is denoted by 1. The length of a word $u$ is denoted by $|u|$. We have
$|u| = n$ for $u = a_1 \cdots a_n$ where $a_i \in A$. The empty word has length 0, and it is
the only word with this property. The set of words of length at most $n$ is denoted
by $A^{\leq n}$, and the set of all nonempty words is $A^+$. We generalize the length of a
word by introducing weights. A weighted alphabet $(A, \|\cdot\|)$ consists of an alphabet
$A$ equipped with a weight function $\|\cdot\| : A \to \mathbb{N} \setminus \{0\}$. The weight of a letter $a \in A$
is $\|a\|$ and the weight $\|u\|$ of a word $u = a_1 \cdots a_n$ with $a_i \in A$ is $\|a_1\| + \cdots + \|a_n\|$.
The weight of the empty word is 0. The length is the special weight with $\|a\| = 1$
for all $a \in A$. A word $u$ is a factor of a word $v$ if there exist $p, q \in A^*$ such that
$puq = v$, and $u$ is a proper factor of $v$ if $pq \neq 1$. The word $u$ is a prefix of $v$ if
$uq = v$ for some $q \in A^*$, and it is a suffix of $v$ if $pu = v$ for some $p \in A^*$. We say
that $u$ is a factor (resp. prefix, resp. suffix) of $v^+$ if there exists $n \in \mathbb{N}$ such that $u$
is a factor (resp. prefix, resp. suffix) of $v^n$. Two words $u, v \in A^*$ are conjugate if
there exist $p, q \in A^*$ such that $u = pq$ and $v = qp$. An integer $m > 0$ is a period
of a word $u = a_1 \cdots a_n$ with $a_i \in A$ if $a_i = a_{i+m}$ for all $1 \leq i \leq n - m$. A word
$u \in A^+$ is primitive if there exists no $v \in A^+$ such that $u = v^n$ for some integer
$n > 1$. It is a standard fact that a word $u$ is not primitive if and only if $u^2 = puq$
for some $p, q \in A^+$. This follows immediately from the result from combinatorics
on words that $xy = yx$ if and only if $x$ and $y$ are powers of a common root; see
e.g. [7, Section 1.3].
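
The notions of weight, period, conjugacy and primitivity translate directly into short string routines. The following Python sketch is only an illustration under the assumption that words are Python strings and a weight function is a dictionary from letters to positive integers; the function names are hypothetical. The primitivity test uses the standard fact quoted above: $u$ is not primitive if and only if $u$ occurs inside $u^2$ at a position other than the two trivial ones.

```python
def weight(word, letter_weight):
    """Weight of a word: the sum of the weights of its letters."""
    return sum(letter_weight[a] for a in word)


def periods(word):
    """All periods m with 0 < m <= |word| (a_i = a_{i+m} for 1 <= i <= n - m)."""
    n = len(word)
    return [m for m in range(1, n + 1)
            if all(word[i] == word[i + m] for i in range(n - m))]


def are_conjugate(u, v):
    """u = pq and v = qp for some p, q; equivalently |u| = |v| and v occurs in uu."""
    return len(u) == len(v) and v in u + u


def is_primitive(u):
    """A nonempty word u is primitive iff its first occurrence in uu after position 0
    is at position |u|, i.e. uu = p u q has no solution with p, q nonempty."""
    return len(u) > 0 and (u + u).find(u, 1) == len(u)
```

For example, `abab` has the periods 2 and 4 and is not primitive, while `aba` is primitive and conjugate to `aab`.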

A monoid $M$ recognizes a language $L \subseteq A^*$ if there exists a homomorphism
$\varphi : A^* \to M$ such that $L = \varphi^{-1}(\varphi(L))$. A language $L \subseteq A^*$ is regular if it is recognized
by a finite monoid. There are various other well-known characterizations of
regular languages; e.g., regular expressions, finite automata or monadic second
order logic. Regular languages $L$ can be classified in terms of structural properties
of the monoids recognizing $L$. In particular, we consider group languages; these
are languages recognized by finite groups.



Semi-Thue systems. A semi-Thue system over $A$ is a subset $S \subseteq A^* \times A^*$. In
this paper, all semi-Thue systems are finite. The elements of $S$ are called rules.
We frequently write $\ell \to r$ for rules $(\ell, r)$. A system $S$ is called length-reducing if
we have $|\ell| > |r|$ for all rules $\ell \to r$ in $S$. It is called weight-reducing with respect
to some weighted alphabet $(A, \|\cdot\|)$ if $\|\ell\| > \|r\|$ for all rules $\ell \to r$ in $S$. Every
system $S$ defines the rewriting relation $\Longrightarrow_S \; \subseteq A^* \times A^*$ by setting $u \Longrightarrow_S v$ if there
exist $p, q, \ell, r \in A^*$ such that $u = p \ell q$, $v = p r q$, and $\ell \to r$ is in $S$.

By $\overset{*}{\Longrightarrow}_S$ we mean the reflexive and transitive closure of $\Longrightarrow_S$. By $\overset{*}{\Longleftrightarrow}_S$ we mean
the symmetric, reflexive, and transitive closure of $\Longrightarrow_S$. We also write $u \overset{*}{\Longleftarrow}_S v$
whenever $v \overset{*}{\Longrightarrow}_S u$. The system $S$ is confluent if for all $u \overset{*}{\Longleftrightarrow}_S v$ there is some $w$
such that $u \overset{*}{\Longrightarrow}_S w \overset{*}{\Longleftarrow}_S v$. It is locally confluent if for all $v \Longleftarrow_S u \Longrightarrow_S v'$ there
exists $w$ such that $v \overset{*}{\Longrightarrow}_S w \overset{*}{\Longleftarrow}_S v'$. If $S$ is locally confluent and weight-reducing
for some weight, then $S$ is confluent; see e.g. PQ E]. Note that $u \Longrightarrow_S v$ implies
that $\|u\| > \|v\|$ for weight-reducing systems. The relation $\overset{*}{\Longleftrightarrow}_S \; \subseteq A^* \times A^*$ is
a congruence, hence the congruence classes $[u]_S = \{v \in A^* \mid u \overset{*}{\Longleftrightarrow}_S v\}$ form a
monoid which is denoted by $A^*/S$. The size of $A^*/S$ is called the index of $S$.
A finite semi-Thue system $S$ can be viewed as a finite set of defining relations.
Hence, $A^*/S$ becomes a finitely presented monoid. By $\mathrm{IRR}_S(A^*)$ we denote the
set of irreducible words in $A^*$, i.e., the set of words where no left-hand side occurs
as a factor.

Whenever the weighted alphabet $(A, \|\cdot\|)$ is fixed, a finite semi-Thue system
$S \subseteq A^* \times A^*$ is called a weighted Church-Rosser system if it is finite,
weight-reducing for $(A, \|\cdot\|)$, and confluent. Hence, a finite semi-Thue system is a
weighted Church-Rosser system if and only if (1) we have $\|\ell\| > \|r\|$ for all rules
$\ell \to r$ in $S$ and (2) every congruence class has exactly one irreducible element.
In particular, for weighted Church-Rosser systems $S$, there is a one-to-one
correspondence between $A^*/S$ and $\mathrm{IRR}_S(A^*)$. A Church-Rosser system is a finite,
length-reducing, and confluent semi-Thue system. In particular, every
Church-Rosser system is a weighted Church-Rosser system. A language $L \subseteq A^*$ is called
a Church-Rosser congruential language if there is a finite Church-Rosser system $S$
such that $L$ can be written as a finite union of congruence classes $[u]_S$.

Definition 1. Let $\varphi : A^* \to M$ be a homomorphism and let $S$ be a semi-Thue
system. We say that $\varphi$ factorizes through $S$ if for all $u, v \in A^*$ we have:

$$u \overset{*}{\Longleftrightarrow}_S v \quad \text{implies} \quad \varphi(u) = \varphi(v).$$

Note that if $S$ is a semi-Thue system and $\varphi : A^* \to M$ factorizes through $S$,
then the following diagram commutes:

$$A^* \xrightarrow{\ \pi\ } A^*/S \xrightarrow{\ \psi\ } M, \qquad \varphi = \psi \circ \pi.$$

Here, $\pi(u) = [u]_S$ is the canonical homomorphism and $\psi([u]_S) = \varphi(u)$.

3 Finite Groups 

Our main result is that every homomorphism $\varphi : A^* \to M$ to a finite monoid $M$
factorizes through a Church-Rosser system $S$. Our proof of this theorem distinguishes
whether or not $M$ is a group. Thus, we first prove this result for groups. Before
we turn to the general case, we show that for some particular groups, proving the
claim is easy. The techniques developed here will also be used when proving the
result for arbitrary finite groups.

3.1 Groups without proper cyclic quotient groups 

The aim of this section is to show that finding a Church-Rosser system is very easy
in many cases. These cases include all finite (non-cyclic) simple groups,
but the list goes far beyond this. Let $\varphi : A^* \to G$ be a homomorphism to a finite
group, where $(A, \|\cdot\|)$ is a weighted alphabet. This defines a regular language
$L_G = \{w \in A^* \mid \varphi(w) = 1\}$. Let us assume that the greatest common divisor
$\gcd\{\|w\| \mid w \in L_G\}$ is equal to one; e.g. $\{6, 10, 15\} \subseteq \{\|w\| \mid w \in L_G\}$. Then
there are two words $u, v \in L_G$ such that $\|u\| - \|v\| = 1$. Now we can use these
words to find a constant $d$ such that all $g \in G$ have a representing word $v_g$ with the
exact weight $\|v_g\| = d$. To see this, start with some arbitrary set of representing
words $v_g$. We multiply words $v_g$ with smaller weight with $u$ and words $v_g$ with higher
weight with $v$ until all weights are equal.
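
The padding argument can be made explicit with a little arithmetic. If $L_G$ contains words of weights 6, 10 and 15, then concatenation yields a word $u \in L_G$ of weight $6 + 10 = 16$ and a word $v \in L_G$ of weight 15. The following Python sketch shows one possible way (not necessarily the authors' bookkeeping) to choose exponents $i_g, j_g$ such that all words $v_g u^{i_g} v^{j_g}$ have the same weight $d$; the function name `equalize` and the dictionary layout are assumptions made for this illustration.

```python
def equalize(weights, wu, wv):
    """Pad representatives with neutral words u and v until all weights agree.

    weights: dict mapping a group element g to the weight of some representative v_g
    wu, wv:  weights of two words u, v in L_G with wu - wv == 1
    Returns the common weight d and, per element, the exponents (i, j) such that
    v_g * u**i * v**j has weight d. Multiplying by u or v does not change the
    group element because both words map to the neutral element.
    """
    assert wu - wv == 1
    top = max(weights.values())
    # Missing weight of each representative compared to the heaviest one.
    gap = {g: top - w for g, w in weights.items()}
    pad = max(gap.values())
    # Every representative receives exactly `pad` neutral factors; a lighter word
    # gets more copies of u, and each copy of u is one unit heavier than v.
    exponents = {g: (gap[g], pad - gap[g]) for g in weights}
    d = top + pad * wv
    return d, exponents
```

With the weights 3, 5, 4 for three representatives and $\|u\| = 16$, $\|v\| = 15$, the sketch pads every word with two neutral factors and obtains the common weight $d = 5 + 2 \cdot 15 = 35$.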

The final step is to define the following weight-reducing system

$$S_G = \{ w \to v_{\varphi(w)} \mid w \in A^* \text{ and } d < \|w\| \leq d + \max\{\|a\| \mid a \in A\} \}.$$

Confluence of $S_G$ is trivial; and every language recognized by $\varphi$ is also recognized
by the canonical homomorphism $A^* \to A^*/S_G$.

Now assume that we are not so lucky, i.e., $\gcd\{\|w\| \mid w \in L_G\} > 1$. This means
there is a prime number $p$ such that $p$ divides $\|w\|$ for all $w \in L_G$. Then, the
homomorphism of $A^*$ to $\mathbb{Z}/p\mathbb{Z}$ defined by $a \mapsto \|a\| \bmod p$ factorizes through $\varphi$
and $\mathbb{Z}/p\mathbb{Z}$ becomes a quotient group of $G$. This can never happen if $G$ is simple
and non-cyclic, because a simple group does not have any proper quotient group.
But there are many other cases where a natural homomorphism $A^* \to G$ for some
weighted alphabet $(A, \|\cdot\|)$ satisfies the property $\gcd\{\|w\| \mid w \in L_G\} = 1$ although
$G$ has a non-trivial cyclic quotient group. Just consider the length function and a
presentation by standard generators for dihedral groups or the permutation
groups $S_n$ where $n$ is odd.

For example, let $G = S_3$ be the permutation group of a triangle (the dihedral group
of order 6). Then $G$ is generated by elements $\tau$ and $\rho$ with defining relations

$$\tau^2 = \rho^3 = 1 \quad \text{and} \quad \tau\rho\tau = \rho^2.$$

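The following six words of length 3 represent all six group elements:

$$1 = \rho^3, \quad \rho = \rho\tau^2, \quad \rho^2 = \tau\rho\tau, \quad \tau = \tau^3, \quad \tau\rho = \rho^2\tau, \quad \tau\rho^2.$$

The corresponding monoid $\{\rho, \tau\}^*/S_G$ has 15 elements.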
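
The count of 15 elements can be checked mechanically. The following Python sketch models the two generators as permutations of three points (the letter `r` stands for the rotation and `t` for the reflection), uses the length as weight and $d = 3$, and picks for every group element the lexicographically first representative of length 3. This choice of representatives and all names in the code are assumptions of the sketch; other representatives of length 3 give a different system with the same set of irreducible words.

```python
from itertools import product

IDENTITY = (0, 1, 2)
GENERATORS = {"r": (1, 2, 0),   # rotation of the triangle
              "t": (1, 0, 2)}   # reflection
D = 3                           # common weight (length) of all representatives


def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[p[i]] for i in range(3))


def phi(word):
    """The homomorphism {r, t}* -> S_3."""
    g = IDENTITY
    for letter in word:
        g = compose(g, GENERATORS[letter])
    return g


def words(n):
    """All words of length n over {r, t}."""
    return ["".join(w) for w in product("rt", repeat=n)]


# One representative of length D per group element (first one found).
REP = {}
for w in words(D):
    REP.setdefault(phi(w), w)

# The system S_G: every word of length D + 1 is rewritten to its representative.
RULES = {w: REP[phi(w)] for w in words(D + 1)}


def normal_form(word):
    """Rewrite the prefix of length D + 1 until the word has length at most D."""
    while len(word) > D:
        word = RULES[word[:D + 1]] + word[D + 1:]
    return word


# Irreducible words: no left-hand side occurs as a factor.
IRREDUCIBLE = [w for n in range(D + 3) for w in words(n)
               if not any(lhs in w for lhs in RULES)]
```

Here `len(REP)` is 6 and `IRREDUCIBLE` consists of the $1 + 2 + 4 + 8 = 15$ words of length at most 3. Every longer word reduces to the representative of its group element, because the last rewriting step replaces a whole word of length 4.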

It is much harder to find a Church-Rosser system for the homomorphism
$\varphi : \{a, b, c\}^* \to \mathbb{Z}/3\mathbb{Z}$ where $\varphi(a) = \varphi(b) = \varphi(c) = 1 \bmod 3$. In some sense this
phenomenon suggests that finite cyclic groups or, more generally, commutative groups
are the obstacle to finding a simple construction for Church-Rosser systems.

3.2 The general case for group languages 

In this section, we consider arbitrary groups. We start with some simple properties
of Church-Rosser systems. Then, in Theorem 5, we state and prove that group
languages are Church-Rosser congruential.

Lemma 2. Let $(A, \|\cdot\|)$ be a weighted alphabet, let $d \in \mathbb{N}$, and let $S \subseteq A^* \times A^*$ be
a weighted Church-Rosser system such that $\mathrm{IRR}_S(A^*)$ is finite. Then

$$S_d = \{ u \ell v \to u r v \mid u, v \in A^d \text{ and } \ell \to r \in S \}$$

is a weighted Church-Rosser system satisfying:

1. $\mathrm{IRR}_{S_d}(A^*)$ is finite.

2. All words of length at most $2d$ are irreducible with respect to $S_d$.

3. The mapping $[u]_{S_d} \mapsto [u]_S$ for $u \in A^*$ is well-defined and yields a surjective
homomorphism from $A^*/S_d$ onto $A^*/S$.

Proof. First, one shows that local confluence of $S$ transfers to local confluence of
$S_d$. For 1 and 2 note that $\mathrm{IRR}_{S_d}(A^*) = A^{\leq 2d} \cup A^d \cdot \mathrm{IRR}_S(A^*) \cdot A^d$. The
remaining proof is straightforward and therefore left to the reader. □
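
The construction of Lemma 2 can be tried out on a toy system. The following Python sketch assumes the alphabet $\{a, b\}$ with the length as weight and the system $S = \{aa \to a,\ ab \to a,\ ba \to a,\ bb \to a\}$, which is confluent with $\mathrm{IRR}_S(A^*) = \{1, a, b\}$; this system and the function names are chosen only for illustration.

```python
from itertools import product

ALPHABET = "ab"
# Toy system (an assumption of this sketch): every word of length 2 rewrites to a.
S = {"aa": "a", "ab": "a", "ba": "a", "bb": "a"}


def words(k):
    """All words of length k."""
    return ["".join(w) for w in product(ALPHABET, repeat=k)]


def lift(rules, d):
    """The system S_d: wrap every rule into all contexts u, v of length d."""
    contexts = words(d)
    return {u + lhs + v: u + rhs + v
            for lhs, rhs in rules.items() for u in contexts for v in contexts}


def is_irreducible(word, rules):
    """No left-hand side occurs as a factor."""
    return not any(lhs in word for lhs in rules)


def normal_form(word, rules):
    """Apply rules (first applicable rule, leftmost occurrence) until none applies."""
    while True:
        for lhs, rhs in rules.items():
            i = word.find(lhs)
            if i != -1:
                word = word[:i] + rhs + word[i + len(lhs):]
                break
        else:
            return word


S1 = lift(S, 1)
```

In this example `S1` has 16 rules whose left-hand sides are exactly the words of length 4, so the irreducible words of `S1` are the 15 words of length at most 3. This agrees with $A^{\leq 2} \cup A \cdot \{1, a, b\} \cdot A$, and reducing first with `S1` and then with `S` gives the same result as reducing with `S` alone.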

Lemma 3. Let $(A, \|\cdot\|)$ be a weighted alphabet and let $\Delta \subseteq A^+$ such that all words
in $\Delta$ have length at most $t$. Then, for every $n \geq 1$, the set of rules

$$T = \{ \delta^{t+n} \to \delta^t \mid \delta \in \Delta, \ \delta \text{ is primitive} \}$$

yields a weighted Church-Rosser system.

Proof. Every rule in $T$ is weight-reducing. Thus it suffices to show that $T$ is locally
confluent. Let $\delta, \eta \in \Delta$ be primitive with $|\delta| \geq |\eta|$ and suppose $x\delta^{t+n} = \eta^{t+n}y$. If
$\delta^{t+n}$ is a suffix of $\eta^t y$, then $\eta^{t+n}$ is a prefix of $x\delta^t$ and the two $T$-rules $\delta^{t+n} \to \delta^t$
and $\eta^{t+n} \to \eta^t$ can be applied independently of one another. Thus we can assume
$|\delta^{t+n}| > |\eta^t y|$. In particular, $\eta^t$ is a factor of $\delta^{t+n}$. Note that $|\eta^t| \geq |\delta|$. Thus $|\delta|$ is a
period of $\eta^t$.

Let us first consider the case $|\delta| > |\eta|$. Since $\delta$ is primitive, $|\eta|$ cannot be a
divisor of $|\delta|$. In particular, we have $|\eta| \geq 2$. Suppose $|\eta| = 2$. Then $\delta = (ab)^m a$
for $a, b \in A$ and some $m \geq 1$. We conclude that the suffix $a\delta$ or the prefix $\delta a$ of
$\delta^2$ is a factor of $\eta^t$. Since both words $a\delta$ and $\delta a$ have a factor $aa$ and $|\eta| = 2$,
this contradicts $\eta$ being primitive. Therefore, we can assume $|\eta| \geq 3$ and hence
$|\eta^t| \geq |\delta^3|$. It follows that $\delta^2$ is a factor of $\eta^t$ and $|\eta|$ is a period of $\delta^2$. By shifting
the prefix $\delta$ of $\delta^2$ by this period, we can write $\delta^2 = p\delta q$ with $p, q \in A^+$ and $|p| = |\eta|$.
We conclude that $\delta$ is not primitive, which is a contradiction.

Let now $|\delta| = |\eta|$. In this case, the words $\delta$ and $\eta$ are conjugate. Therefore,
applying one of the rules $\delta^{t+n} \to \delta^t$ and $\eta^{t+n} \to \eta^t$ yields the same word. □
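
Lemma 3 can be checked by brute force on a small instance. The following Python sketch assumes the alphabet $\{a, b\}$, the set $\Delta$ of all nonempty words of length at most 2, and the parameters $t = 2$, $n = 1$; these choices and the function names are only for illustration. For every word up to a given length it computes all irreducible descendants under arbitrary rewriting strategies, and confluence means that this set is always a singleton.

```python
from itertools import product


def is_primitive(u):
    """u is primitive iff it occurs in uu only at positions 0 and |u|."""
    return (u + u).find(u, 1) == len(u)


def build_rules(delta, t, n):
    """The system T = { d^(t+n) -> d^t : d in delta, d primitive }."""
    return {d * (t + n): d * t for d in delta if is_primitive(d)}


def successors(word, rules):
    """All words reachable from `word` in one rewriting step."""
    result = set()
    for lhs, rhs in rules.items():
        start = word.find(lhs)
        while start != -1:
            result.add(word[:start] + rhs + word[start + len(lhs):])
            start = word.find(lhs, start + 1)
    return result


def normal_forms(word, rules, cache):
    """Set of all irreducible words reachable from `word` (memoized)."""
    if word not in cache:
        succ = successors(word, rules)
        if not succ:
            cache[word] = {word}
        else:
            cache[word] = set().union(*(normal_forms(s, rules, cache) for s in succ))
    return cache[word]


DELTA = ["".join(w) for k in (1, 2) for w in product("ab", repeat=k)]
RULES = build_rules(DELTA, t=2, n=1)
```

In this instance `RULES` contains the four rules with left-hand sides `aaa`, `bbb`, `ababab` and `bababa`; the words `aa` and `bb` are not primitive and do not contribute a rule.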

Lemma 4. Let $\Delta \subseteq A^+$ be a set of words such that all words in $\Delta$ have length at
most $n$. If $u \in A^{>2n}$ is not a factor of $\delta^+$ for any $\delta \in \Delta$, then there is a proper
factor $v$ of $u$ which is also not a factor of $\delta^+$ for any $\delta \in \Delta$.

Proof. Assume that such a factor $v$ of $u$ does not exist. Let $u = awb$ for $a, b \in A$.
Then $aw$ is a factor of $\delta^+$ and $wb$ is a factor of $\delta'^+$ for some $\delta, \delta' \in \Delta$. Let $p = |\delta|$
and $q = |\delta'|$. Now, $p$ is a period of $aw$ and $q$ is a period of $wb$. Thus $p$ and $q$ are
both periods of $w$. Since $|w| \geq 2n - 1 \geq p + q - \gcd(p, q)$, we see that $\gcd(p, q)$
is also a period of $w$ by the Periodicity Lemma of Fine and Wilf [7, Section 1.3].
The $(p + 1)$-th letter in $aw$ is $a$. Going in steps of $\gcd(p, q)$ to the left or to the right
in $w$, we see that the $(q + 1)$-th letter in $aw$ is $a$. Thus $awb$ is a factor of $\delta'^+$, which
is a contradiction. □
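
Lemma 4 says that the minimal words outside the set of factors of powers have length at most $2n$. A brute-force check on a small instance may help to read the statement. The following Python sketch assumes the alphabet $\{a, b\}$, $n = 2$, and $\Delta$ consisting of all nonempty words of length at most 2; these choices and the function names are hypothetical.

```python
from itertools import product


def words(alphabet, lo, hi):
    """All words over `alphabet` with length between lo and hi (inclusive)."""
    for k in range(lo, hi + 1):
        for w in product(alphabet, repeat=k):
            yield "".join(w)


def is_factor_of_power(u, delta):
    """True iff u is a factor of d^+ for some d in delta.

    The power d^k with k = |u| // |d| + 2 is long enough to contain every
    factor of d^+ of length |u|, whatever its offset inside d.
    """
    return any(u in d * (len(u) // len(d) + 2) for d in delta)


def proper_factors(u):
    """All factors of u except u itself (including the empty word)."""
    return {u[i:j] for i in range(len(u)) for j in range(i, len(u) + 1)} - {u}


N = 2
DELTA = list(words("ab", 1, N))


def lemma_holds_up_to(max_len):
    """Check the statement of Lemma 4 for all words u with 2N < |u| <= max_len."""
    for u in words("ab", 2 * N + 1, max_len):
        if not is_factor_of_power(u, DELTA):
            if all(is_factor_of_power(v, DELTA) for v in proper_factors(u)):
                return False
    return True
```

For this instance the minimal words that are not factors of any $\delta^+$ are `aab`, `abb`, `baa` and `bba`, all of length $3 \leq 2n$.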

We are now ready to prove the main result of this section: Group languages are
Church-Rosser congruential. An outline of the proof is as follows. By induction
on the size of the alphabet, we show that every homomorphism $\varphi : A^* \to G$
factorizes through a weighted Church-Rosser system $S$ with finite index. Remove
some letter $c$ from the alphabet $A$. This leads to a system $R$ for the remaining
letters $B$. Lemma 2 allows us to assume that certain words are irreducible. Then
we consider $K = \mathrm{IRR}_R(B^*)c$ which is a prefix code in $A^*$. We consider $K$ as
a new alphabet. Essentially, it is this situation where weighted alphabets come
into play because we can choose the weight of $K$ such that it is compatible with
the weight over the alphabet $A$. Over $K$, we introduce two sets of rules $T_\Delta$ and
$T_\Omega$. The $T_\Delta$-rules reduce long repetitions of short words $\Delta$, and the $T_\Omega$-rules have
the form $\omega u \omega \to \omega v_g \omega$. Here, $\Omega$ is some finite set of markers and $\omega \in \Omega$ is
such a marker. The word $v_g$ is a normal form for the group element $g$. The
$T_\Omega$-rules reduce long words without long repetitions of short words. Then we
show that $T_\Delta$ and $T_\Omega$ are confluent and that their union has finite index over $K^*$.
Here, the confluence of the $T_\Delta$-rules is Lemma 3. The confluence of the $T_\Omega$-rules
relies on several combinatorial properties of the normal forms $v_g$ and the markers
$\Omega$. Using Lemma 4, we see that all sufficiently long words are reducible. Since
by construction all rules in $T = T_\Delta \cup T_\Omega$ are weight-reducing, the system $T$ is
a weighted Church-Rosser system over $K^*$ with finite index such that $\varphi : K^* \to G$
factorizes through $T$. Since $K \subseteq A^*$, we can translate the rules $\ell \to r$ in $T$ over
$K^*$ to rules $c\ell \to cr$ over $A^*$. This leads to the set of $T'$-rules over $A^*$. The letter $c$
at the beginning of the $T'$-rules is required to shield them from $R$-rules. Finally, we show
that $S = R \cup T'$ is the desired system over $A^*$.

Theorem 5. Let $(A, \|\cdot\|)$ be a weighted alphabet and let $\varphi : A^* \to G$ be a
homomorphism to a finite group $G$. Then there exists a weighted Church-Rosser system
$S$ with finite index such that $\varphi$ factorizes through $S$.

Proof. In the following $n$ denotes the exponent of $G$; this is the least positive
integer $n$ such that $g^n = 1$ for all $g \in G$. The proof is by induction on the size of
the alphabet $A$. If $A = \{c\}$, then we set $S = \{c^n \to 1\}$. Let now $A = \{a_0, \ldots, a_{s-1}, c\}$
and let $a_0$ have minimal weight. We set $B = A \setminus \{c\}$. Let

$$\gamma_i = a_{i \bmod s}^{\,n + \lfloor i/s \rfloor}\, c.$$

Since $A$ and $\{a_0 c, \ldots, a_{s-1} c, c\}$ generate the same subgroups of $G$ and since every
element $a_j c \in G$ occurs infinitely often as some $\gamma_i$, there exists $m > 0$ such that
for every $g \in G$ there exists a word

$$v_g = \gamma_0^{n_0} \gamma_1^{n_1} \cdots \gamma_m^{n_m} \gamma_0$$

with $n_i > 0$ satisfying $\varphi(v_g) = g$ and $\|v_g\| - \|v_h\| < n \|a_0\|$ for all $g, h \in G$. The
latter property relies on $\|\gamma_0\| + \|a_0\| = \|\gamma_s\|$ and pumping with $\gamma_0^n$ and $\gamma_s^n$ which
both map to the neutral element of $G$: Assume $\|v_g\| - \|v_h\| \geq n \|a_0\|$ for some
$g, h \in G$. Then we do the following. All $v_g$ with maximal weight are multiplied
by $\gamma_0^n$ on the left, and for all other words $v_h$ the exponent $n_s$ of $\gamma_s$ is replaced by
$n_s + n$. After that, the maximal difference $\|v_g\| - \|v_h\|$ has decreased at least by
1 (and at most by $n \|a_0\|$). We can iterate this procedure until the weights of all
$v_g$ differ less than $n \|a_0\|$. Let

$$\Gamma = \{\gamma_0, \ldots, \gamma_m\}$$

be the generators of the $v_g$. By induction there exists a weighted Church-Rosser
system $R$ for the restriction $\varphi : B^* \to G$ satisfying the statement of the theorem.
By Lemma 2 we can assume $\Gamma \subseteq \mathrm{IRR}_R(B^*)c$. Thus $v_g \in \mathrm{IRR}_R(A^*)$ for all $g \in G$.
Let

$$K = \mathrm{IRR}_R(B^*)c.$$

The set $K$ is a prefix code in $A^*$. We consider $K$ as an extended alphabet and its
elements as extended letters. The weight of $u \in K$ is its weight as a word over
$A$. Each $\gamma_i$ is a letter in $K$. The homomorphism $\varphi : A^* \to G$ can be interpreted as
a homomorphism $\varphi : K^* \to G$; it is induced by $u \mapsto \varphi(u)$ for $u \in K$. The length
lexicographic order on $B^*$ induces a linear order $\leq$ on $\mathrm{IRR}_R(B^*)$ and hence also
on $K$. Here, we assume $a_0 < \cdots < a_{s-1}$. The words $v_g$ can be read as words over the
weighted alphabet $(K, \|\cdot\|)$ satisfying the following five properties: First, $v_g$ starts
with the extended letter $\gamma_0$. Second, the last two extended letters of $v_g$ are $\gamma_m \gamma_0$.
Third, all extended letters in $v_g$ are in non-decreasing order from left to right with
respect to $\leq$, with the sole exception of the last letter $\gamma_0$ which is smaller than
its predecessor $\gamma_m$. The fourth property is that all extended letters in $v_g$ have a
weight greater than $n \|a_0\|$. And the last important property is that all differences
$\|v_g\| - \|v_h\|$ are smaller than $n \|a_0\|$. Let

$$\Delta = \{ \delta \in K^+ \mid \delta \in K \text{ or } \|\delta\| \leq n \|a_0\| \}.$$

Note that $\Delta$ is closed under conjugation, i.e., if $uv \in \Delta$ for $u, v \in K^*$, then $vu \in \Delta$.
We can think of $\Delta$ as the set of all "short" words. Choose $t \geq n$ such that all
normal forms $v_g$ have no factor $\delta^{t+n}$ for $\delta \in \Delta$ and such that $\|c^t\| \geq \|u\|$ for all
$u \in K^{\leq 2n}$. Note that $c \in \Delta$ has the smallest weight among all words in $\Delta$.

The first set of rules over the extended alphabet $K$ deals with long repetitions
of short words: The $\Delta$-rules are

$$T_\Delta = \{ \delta^{t+n} \to \delta^t \mid \delta \in \Delta \text{ and } \delta \text{ is primitive} \}.$$

Let $F \subseteq K^*$ contain all words which are a factor of some $\delta^+$ for $\delta \in \Delta$ and let
$J \subseteq K^*$ be minimal such that $K^* J K^* = K^* \setminus F$. By Lemma 4 we have $J \subseteq K^{\leq 2n}$.
In particular, $J$ is finite. Since $J$ and $\Delta$ are disjoint, all words in $J$ have a weight
greater than $n \|a_0\|$. Let $\Omega$ contain all $\omega \in J$ such that $\omega \in \Gamma K^*$ implies $\omega = \gamma\gamma'$
for some $\gamma > \gamma'$, i.e.,

$$\Omega = J \cap \{ \omega \in K^* \mid \omega \notin \Gamma K^* \text{ or } \omega = \gamma\gamma' \text{ for some } \gamma > \gamma' \}.$$

As we will see below, every sufficiently long word without long $\Delta$-repetitions
contains a factor $\omega \in \Omega$.

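Claim 1. There exists a bound $t' \in \mathbb{N}$ such that every word $u \in K^*$ with $\|u\| \geq t'$
contains a factor $\omega \in \Omega$ or a factor of the form $\delta^{t+n}$ for $\delta \in \Delta$.

Proof of Claim 1. Let $t'' = (t + n + 2) \cdot \max\{\|v\| \in \mathbb{N} \mid v \in K\}$. First, suppose
$u \in K^* \setminus K^* \Gamma K^*$ and $\|u\| \geq t''$. If $u$ is a factor of $\delta^+$, then $\delta^{t+n}$ is a factor of $u$ since
$\|\delta\| \leq \max\{\|v\| \in \mathbb{N} \mid v \in K\}$. Thus we can assume $u \in K^* \setminus F$. By definition of
$J$, the word $u$ contains a factor $\omega \in J$. We have $\omega \in \Omega$ because $u$ (and thus $\omega$)
has no factor in $\Gamma$.

If $u \in K^* b \gamma K^*$ for $b \in K \setminus \Gamma$ and $\gamma \in \Gamma$, then $u$ contains a factor $\omega = b\gamma \in \Omega$.
Similarly, if $u \in K^* \gamma\gamma' K^*$ for $\gamma, \gamma' \in \Gamma$ and $\gamma > \gamma'$, then $u$ contains a factor
$\omega = \gamma\gamma' \in \Omega$. Thus, if $u \in K^* \Gamma K^*$, then we can assume $u = \gamma_{i_1} \cdots \gamma_{i_k} u'$ with

• $\gamma_{i_j} \in \Gamma$ and $\gamma_{i_1} \leq \cdots \leq \gamma_{i_k}$, and

• $u' \notin K^* \Gamma K^*$ and $\|u'\| < t''$.

We set $t' = (t + n - 1) \cdot |\Gamma| \cdot \max\{\|v\| \in \mathbb{N} \mid v \in \Gamma\} + 1 + t''$. If $\|u\| \geq t'$, then
$k \geq (t + n - 1) \cdot |\Gamma| + 1$. By the pigeon hole principle, there exists $\gamma \in \{\gamma_{i_1}, \ldots, \gamma_{i_k}\} \subseteq \Delta$
such that $\gamma^{t+n}$ is a factor of $u$. This completes the proof of Claim 1. ◇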

Since $\Delta$ is closed under factors, $u$ contains no factor of the form $\delta^{t+n}$ for $\delta \in \Delta$
if and only if $u \in \mathrm{IRR}_{T_\Delta}(K^*)$. In particular, it is no restriction to only allow
primitive words from $\Delta$ in the rules $T_\Delta$. Every sufficiently long word $u'$ can be
written as $u' = u_1 \cdots u_k$ with $\|u_i\| \geq t'$ and $k$ sufficiently large. Thus, by repeatedly
applying Claim 1, there exists a non-negative integer $t_\Omega$ such that every word
$u' \in \mathrm{IRR}_{T_\Delta}(K^*)$ with $\|u'\| \geq t_\Omega$ contains two occurrences of the same $\omega \in \Omega$ which
are far apart. More precisely, $u'$ has a factor $\omega u \omega$ with $\|u\| > \|v_g\|$ for all $g \in G$.

This suggests rules of the form $\omega u \omega \to \omega v_{\varphi(u)} \omega$; but in order to ensure
confluence we have to limit their use. For this purpose, we equip $\Omega$ with a linear order $\preceq$
such that $\gamma_m \gamma_0$ is the smallest element, and every element in $\Omega \cap K^+ \gamma_0$ is smaller
than all elements in $\Omega \setminus K^+ \gamma_0$. By making $t_\Omega$ bigger, we can assume that every
word $u'$ with $\|u'\| \geq t_\Omega$ contains a factor $\omega u \omega$ such that

• $\|u\| > \|v_g\|$ for all $g \in G$, and

• for every factor $\omega' \in \Omega$ of $\omega u \omega$ we have $\omega' \preceq \omega$.

The following claim is one of the main reasons for using the above definition of the
normal forms $v_g$, and also for excluding all words $\omega \in \Gamma K^*$ in the definition of $\Omega$
except for $\omega = \gamma\gamma' \in \Gamma^2$ with $\gamma > \gamma'$.

Claim 2. Let $\omega, \omega' \in \Omega$ and $g \in G$. If $\omega v_g \omega \in K^* \omega' K^*$, then $\omega' \preceq \omega$.

Proof of Claim 2. All normal forms $v_g$ have $\gamma_m \gamma_0$ as a suffix. In addition, the
word $\gamma_m \gamma_0$ is the only element in $\Omega$ which is a factor of some $v_g$ for $g \in G$. The
reason is that all other letters in $v_g$ are in non-decreasing order whereas all $\gamma\gamma' \in \Omega$
are in decreasing order. In particular, if $\gamma_m \gamma_0 v_g \gamma_m \gamma_0 \in K^* \omega' K^*$ for $\omega' \in \Omega$, then
$\omega' = \gamma_m \gamma_0$, i.e., $\gamma_m \gamma_0$ is the only factor of $\gamma_m \gamma_0 v_g \gamma_m \gamma_0$ which is in $\Omega$.

Let now $\omega = b\gamma_0$ for $b \in K \setminus \{\gamma_0\}$. Note that $\omega \in \Omega$ and that all elements
in $\Omega \cap K^+ \gamma_0$ have this form. Then the set of factors of $\omega v_g \omega$ which are in $\Omega$ is
$\{\gamma_m \gamma_0, \omega\}$. Since $\gamma_m \gamma_0$ is the smallest element with respect to $\preceq$, each of them
satisfies the claim.

Next, suppose $\omega \in K^+ b$ for $b \in K \setminus \{\gamma_0\}$. Then the set of factors of $\omega v_g \omega$ which
are in $\Omega$ is $\{\gamma_m \gamma_0, b\gamma_0, \omega\}$. Since every element ending with $\gamma_0$ is smaller than any
other element in $\Omega$, the claim also holds in this case. This completes the proof of
Claim 2. ◇

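We are now ready to define the second set of rules over the extended alphabet
$K$. They reduce long words without long repetitions of words in $\Delta$. We set

$$T'_\Omega = \{ \omega u \omega \to \omega v_{\varphi(u)} \omega \mid \|v_{\varphi(u)}\| < \|u\| \leq t_\Omega \text{ and } \omega u \omega \text{ has no factor } \omega' \in \Omega \text{ with } \omega \prec \omega' \}.$$

Whenever there is a shorter rule in $T'_\Omega \cup T_\Delta$ then we want to give preference to
this shorter rule. Thus the $\Omega$-rules are

$$T_\Omega = \{ \ell \to r \in T'_\Omega \mid \text{there is no rule } \ell' \to r' \in T'_\Omega \cup T_\Delta \text{ such that } \ell' \text{ is a proper factor of } \ell \}.$$

Let now

$$T = T_\Delta \cup T_\Omega.$$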

Claim 3. The system $T$ is locally confluent over $K^*$.

Proof of Claim 3. The system $T_\Delta$ is confluent by Lemma 3. Suppose we can apply
two rules $\ell \to r \in T_\Omega$ and $\ell' \to r' \in T_\Delta$. Then $\ell'$ is not a factor of $\ell$. Let $\ell = \omega u \omega$.
Since $\omega$ is not a factor of $\ell'$, it is possible to first apply $\ell \to r$ and then apply
$\ell' \to r'$. Moreover, by choice of $t$ we have $\|\omega\| \leq \|r'\|$. Thus we also can first
apply $\ell' \to r'$ and then $\ell \to r$.



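If $u \in \mathrm{IRR}_{T_\Delta}(K^*)$ and $u \Longrightarrow_{T_\Omega} v$, then $v \in \mathrm{IRR}_{T_\Delta}(K^*)$ by definition of the normal
forms $v_g$ and the set $\Omega$. Thus, it remains to show that $T_\Omega$ is locally confluent
on $\mathrm{IRR}_{T_\Delta}(K^*)$. By minimality of $J$, no $\omega \in \Omega$ is a proper factor of another word
$\omega' \in \Omega$. Let $\omega u \omega \to r$ and $\omega' u' \omega' \to r'$ be two $\Omega$-rules with $\omega \neq \omega'$. By construction
of $T_\Omega$, the left sides of both rules can overlap at most $\min\{|\omega|, |\omega'|\} - 1$ positions.
Thus the two rules can always be applied independently of one another.

Let now $\omega u \omega \to \omega v_g \omega$ and $\omega u' \omega \to \omega v_h \omega$ be two $\Omega$-rules. By construction of
$T_\Omega$, neither is $\omega u' \omega$ a proper factor of $\omega u \omega$ nor vice versa. If $x\omega = \omega y$ for some
$x, y \in K^+$ with $\|x\| \leq n \|a_0\|$, then $x \in \Delta$ and $\omega$ is a prefix of $x^+$ which contradicts
the definition of $J \subseteq K^* \setminus F$. Therefore, whenever $x\omega = \omega y$ for $x, y \in K^+$,
then $\|x\| > n \|a_0\|$ and $\|y\| > n \|a_0\|$. Suppose now $x \omega u \omega = \omega u' \omega y = \omega u'' \omega$ for
$x, y \in K^+$. If $|x| \geq |\omega u|$, then the two rules can be applied independently of
one another. Thus let $|x| < |\omega u|$. As seen before, we have $\|x\| > n \|a_0\|$ and
$\|y\| > n \|a_0\|$. We will show

$$x \omega v_g \omega \overset{*}{\Longrightarrow}_{T_\Omega} \omega v_{\varphi(u'')} \omega \overset{*}{\Longleftarrow}_{T_\Omega} \omega v_h \omega y.$$

If $x \omega v_g \omega \in K^* \omega' K^*$ or $\omega v_h \omega y \in K^* \omega' K^*$, then by Claim 2 we have $\omega' \preceq \omega$. We
can write $x\omega = \omega x'$. Since $\|x'\| = \|x\| > n \|a_0\|$, we have $\|x' v_g\| > n \|a_0\| + \|v_g\| > \|v_{g'}\|$
for every $g' \in G$. This relies on the fact that the weights of all normal forms
$v_{g'}$ differ less than $n \|a_0\|$. This shows that the weight of $x' v_g$ is sufficiently high.
If $\|x' v_g\| > t_\Omega$, then by Claim 1 we have $x' v_g \overset{*}{\Longrightarrow}_{T_\Omega} x''$ such that $\|v_{g'}\| < \|x''\| \leq t_\Omega$
for every $g' \in G$. Therefore, without loss of generality we can assume that the
weight of $x' v_g$ is not too high, i.e., $\|x' v_g\| \leq t_\Omega$. Since $\varphi(x' v_g) = \varphi(u'')$, we have
$x \omega v_g \omega \overset{*}{\Longrightarrow}_{T_\Omega} \omega v_{\varphi(u'')} \omega$. Similarly, $\omega v_h \omega y \overset{*}{\Longrightarrow}_{T_\Omega} \omega v_{\varphi(u'')} \omega$. This completes the proof of
Claim 3. ◇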

Since all rules in $T$ are weight-reducing, local confluence implies confluence.
Moreover, all rules $\ell \to r$ in $T$ satisfy $\varphi(\ell) = \varphi(r)$. We conclude that $T$ is
a weighted Church-Rosser system such that $K^*/T$ is finite and $\varphi : K^* \to G$
factorizes through $T$. Remember that every element in $K^*$ can be read as a
sequence of elements in $A^*$. Thus every $u \in K^*$ can be interpreted as a word
$u \in A^*$. We use this interpretation in order to apply the rules in $T$ to words in
$A^*$; but in order to not destroy $K$-letters when applying rules in $R$, we have to
guard the first $K$-letter of every $T$-rule by prepending the letter $c$. This leads to
the system

$$T' = \{ c\ell \to cr \in A^* \times A^* \mid \ell \to r \in T \}.$$

Combining the rules $R$ over the alphabet $B$ with the $T'$-rules yields

$$S = R \cup T'.$$

Since left sides of $R$-rules and of $T'$-rules cannot overlap, the system $S$ is confluent.
By definition, each $S$-rule is weight-reducing. This means that $S$ is a weighted
Church-Rosser system. We have

$$\mathrm{IRR}_S(A^*) = \mathrm{IRR}_R(B^*) \cup \mathrm{IRR}_R(B^*) \cdot \mathrm{IRR}_{T'}\bigl(c(\mathrm{IRR}_R(B^*)c)^*\bigr) \cdot \mathrm{IRR}_R(B^*).$$

Therefore $\mathrm{IRR}_S(A^*)$ and $A^*/S$ are finite. Since all rules $\ell \to r$ in $S$ satisfy $\varphi(\ell) =
\varphi(r)$, the homomorphism $\varphi$ factorizes through $S$. □

4 Arbitrary Finite Monoids



This section contains the main result of this paper. We show that every
homomorphism $\varphi : A^* \to M$ to a finite monoid factorizes through a weighted Church-Rosser
system $S$ with finite index. The proof relies on Theorem 5 and on a construction
called local divisors.

4.1 Local divisors 

The notion of local divisor has turned out to be a rather powerful tool when using
inductive proofs for finite monoids, see e.g. [3, 4, 5]. The same is true in this paper.
The definition of a local divisor is as follows: Let $M$ be a monoid and let $c \in M$.
We equip $cM \cap Mc$ with a monoid structure by introducing a new multiplication $\circ$
as follows:

$$xc \circ cy = xcy.$$

It is straightforward to see that $\circ$ is well-defined and $(cM \cap Mc, \circ)$ is a monoid
with neutral element $c$.

The following observation is crucial. If $1 \in cM \cap Mc$, then $c$ is a unit. Thus
if the monoid $M$ is finite and $c$ is not a unit, then $|cM \cap Mc| < |M|$. The set
$M' = \{x \mid cx \in Mc\}$ is a submonoid of $M$, and $c\cdot : M' \to cM \cap Mc : x \mapsto cx$ is
a surjective homomorphism. Since $(cM \cap Mc, \circ)$ is the homomorphic image of a
submonoid, it is a divisor of $M$. We therefore call $(cM \cap Mc, \circ)$ the local divisor
of $M$ at $c$.
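
The local divisor can be computed explicitly for a small monoid. The following Python sketch assumes, purely as an illustration, that $M$ is the monoid of all 27 maps on a three-element set (maps are tuples, the product applies the left factor first) and that $c$ is a non-bijective map; the names `compose` and `local_divisor` are hypothetical. The code builds $cM \cap Mc$, tabulates the multiplication $xc \circ cy = xcy$, and fails with an assertion if the product depended on the choice of $x$ and $y$.

```python
from itertools import product


def compose(f, g):
    """Product fg in the transformation monoid: apply f first, then g."""
    return tuple(g[f[i]] for i in range(len(f)))


def local_divisor(M, c):
    """Return the carrier cM ∩ Mc and the multiplication table of the local divisor."""
    cM = {compose(c, x) for x in M}
    Mc = {compose(x, c) for x in M}
    carrier = cM & Mc
    table = {}
    for a in carrier:
        xs = [x for x in M if compose(x, c) == a]       # all x with a = xc
        for b in carrier:
            ys = [y for y in M if compose(c, y) == b]   # all y with b = cy
            results = {compose(compose(x, c), y) for x in xs for y in ys}
            assert len(results) == 1                    # the product is well-defined
            table[a, b] = results.pop()
    return carrier, table


# Example data (an assumption of this sketch): all maps on {0, 1, 2} and one
# element c that is not a unit because it is not a permutation.
M = list(product(range(3), repeat=3))
C = (0, 0, 1)
CARRIER, TABLE = local_divisor(M, C)
```

In this example `C` is the neutral element of the new multiplication, the identity map does not belong to `CARRIER`, and `CARRIER` is strictly smaller than `M`, as predicted by the observation above.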

4.2 The main result 

We are now ready to prove our main result: Every homomorphism $\varphi : A^* \to M$ to
a finite monoid factorizes through a weighted Church-Rosser system $S$ with finite
index. The proof uses induction on the size of $M$ and the size of $A$. If $\varphi(A^*)$ is
a group, then we apply Theorem 5, and if $\varphi(A^*)$ is not a group, then we find a
letter $c \in A$ such that $\varphi(c)$ is not a unit. Thus in this case we can use local divisors.

Theorem 6. Let $(A, \|\cdot\|)$ be a weighted alphabet and let $\varphi : A^* \to M$ be a
homomorphism to a finite monoid $M$. Then there exists a weighted Church-Rosser
system $S$ of finite index such that $\varphi$ factorizes through $S$.

Proof. The proof is by induction on $(|M|, |A|)$ with lexicographic order. If $\varphi(A^*)$
is a group, then the claim follows by Theorem 5. If $\varphi(A^*)$ is not a group, then
there exists $c \in A$ such that $\varphi(c)$ is not a unit. Let $B = A \setminus \{c\}$. By induction on
the size of the alphabet there exists a weighted Church-Rosser system $R$ for the
restriction $\varphi : B^* \to M$ satisfying the statement of the theorem. Let

$$K = \mathrm{IRR}_R(B^*)c.$$

We consider the prefix code $K$ as a weighted alphabet. The weight of a letter
$uc \in K$ is the weight $\|uc\|$ when read as a word over the weighted alphabet
$(A, \|\cdot\|)$. Let $M_c = \varphi(c)M \cap M\varphi(c)$ be the local divisor of $M$ at $\varphi(c)$. We let
$\psi : K^* \to M_c$ be the homomorphism induced by $\psi(uc) = \varphi(cuc)$ for $uc \in K$. By
induction on the size of the monoid there exists a weighted Church-Rosser system
$T \subseteq K^* \times K^*$ for $\psi$ satisfying the statement of the theorem. Suppose $\psi(\ell) = \psi(r)$
for $\ell, r \in K^*$ and let $\ell = u_1 c \cdots u_j c$ and $r = v_1 c \cdots v_k c$ with $u_i, v_i \in \mathrm{IRR}_R(B^*)$.
Then

$$\begin{aligned}
\varphi(c\ell) &= \varphi(c u_1 c) \circ \cdots \circ \varphi(c u_j c) \\
&= \psi(u_1 c) \circ \cdots \circ \psi(u_j c) \\
&= \psi(\ell) = \psi(r) = \varphi(cr).
\end{aligned}$$

This means that every $T$-rule $\ell \to r$ yields a $\varphi$-invariant rule $c\ell \to cr$. Thus we
can transform the system $T \subseteq K^* \times K^*$ for $\psi$ into a system $T' \subseteq A^* \times A^*$ for $\varphi$ by

$$T' = \{ c\ell \to cr \in A^* \times A^* \mid \ell \to r \in T \}.$$

Since $T$ is confluent and weight-reducing over $K^*$, the system $T'$ is confluent and
weight-reducing over $A^*$. Combining $R$ and $T'$ leads to

$$S = R \cup T'.$$

The left sides of a rule in $R$ and a rule in $T'$ cannot overlap. Therefore, $S$ is a
weighted Church-Rosser system such that $\varphi$ factorizes through $S$. Suppose
that every word in $\mathrm{IRR}_T(K^*)$ has length at most $k$. Here, the length is over
the extended alphabet $K$. Similarly, let every word in $\mathrm{IRR}_R(B^*)$ have length at
most $m$. Then

$$\mathrm{IRR}_S(A^*) \subseteq \{ u_0 c u_1 \cdots c u_{k'+1} \mid u_i \in \mathrm{IRR}_R(B^*), \ k' \leq k \}$$

and every word in $\mathrm{IRR}_S(A^*)$ has length at most $(k + 2)m$. In particular, $\mathrm{IRR}_S(A^*)$
and $A^*/S$ are finite. □

The following corollary is a straightforward translation of the result in Theorem 6
about homomorphisms to a statement about regular languages.

Corollary 7. A language $L \subseteq A^*$ is regular if and only if there exists a
Church-Rosser system $S$ of finite index such that $L = \bigcup_{u \in L} [u]_S$.

Proof. If $L$ is regular, then there exists a homomorphism $\varphi : A^* \to M$ recognizing
$L$. By Theorem 6 there exists a finite Church-Rosser system $S$ of finite index such
that $\varphi$ factorizes through $S$. The latter property implies $\varphi^{-1}(x) = \bigcup_{u \in \varphi^{-1}(x)} [u]_S$ for
every $x \in M$. Thus $L = \bigcup_{x \in \varphi(L)} \varphi^{-1}(x) = \bigcup_{u \in L} [u]_S$. The converse is trivial. □

In particular, we see that all regular languages are Church-Rosser congruential.